perf : Optimize count distinct using bitmaps instead of hashsets for smaller datatypes#21456
perf : Optimize count distinct using bitmaps instead of hashsets for smaller datatypes#21456coderfender wants to merge 18 commits intoapache:mainfrom
Conversation
|
benchmark results : It seems like we are 25x faster for u16 bitmap based accumulators (or I am sleepy :) ) |
|
I think we can do the same for 16 bit types, it is just 65_536 bytes 8192 if we use a bitmap. |
|
Oh wait, you're already doing that :) |
|
Query 0 in clickbench_extended dataset (which uses (Other queries are faster but I believe that is more around variance ) |
|
cc : @neilconway , @alamb , @martin-g . Please take a look whenever you get a chance |
This comment has been minimized.
This comment has been minimized.
alamb
left a comment
There was a problem hiding this comment.
This looks like a great idea. Thank you @coderfender
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
93acd98 to
7e67e2e
Compare
|
run benchmark count_distinct |
|
run benchmark clickbench |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize_count_distinct (a13bcaa) to fbdf770 (merge-base) diff File an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize_count_distinct (a13bcaa) to fbdf770 (merge-base) diff File an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
Recent benchmarks : ( not sure if this is due to variance with the build env ) |
|
run benchmark count_distinct |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize_count_distinct (61dc8e1) to eaf0a41 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Okay this still seems to be an issue. Let me try and see if I can add additional hints to the compiler and see if that helps not regress existing hotpaths for i64 |
|
Hi @coderfender, thanks for the request (#21456 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas. File an issue against this benchmark runner |
|
run benchmark count_distinct |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize_count_distinct (554f60c) to eaf0a41 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
My suspicion is that the match arm's bloat is causing regression . Trying out options to reduce the CPU cache pressure to potentially reduce code bloat / match arm bloat and see if that removes the regression |
|
Pushed a commit to move out bitmap based accumulators to a separate function with a |
|
run benchmark count_distinct |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize_count_distinct (87a9af8) to eaf0a41 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
@alamb, @Dandandan the regression with i64 now seems to be fixed (and slightly faster) with moving bitmap accumulations inside function call with compiler hint. Please take a look whenever you get a chance |
|
@martin-g , the null checks pushed up seem to be providing mixed results . Let me try and see if there are are opportunities to avoid super minor regressions |
|
I get this on the latest version of this PR (local run on my mac)- |
|
Thank you for the approval @Dandandan :) |
|
Thank you for the approval @alamb |
Which issue does this PR close?
Remove hashset based accumulators for smaller int data types and use bitmaps. Follow up of : #21453
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?